AITopics | multiple oracle

3c56fe2f24038c4d22b9eb0aca78f590-Paper.pdf

Neural Information Processing Systems | Feb-8-2026, 03:56:18 GMT

algorithm, oracle, oracle policy, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.95)

Policy Improvement via Imitation of Multiple Oracles

Neural Information Processing Systems | Dec-23-2025, 23:16:07 GMT

Despite its promise, reinforcement learning's real-world adoption has been hampered by the need for costly exploration to learn a good policy. Imitation learning (IL) mitigates this shortcoming by using an oracle policy during training as a bootstrap to accelerate the learning process. However, in many practical situations, the learner has access to multiple suboptimal oracles, which may provide conflicting advice in a state. The existing IL literature provides a limited treatment of such scenarios. Whereas in the single-oracle case, the return of the oracle's policy provides an obvious benchmark for the learner to compete against, neither such a benchmark nor principled ways of outperforming it are known for the multi-oracle setting. In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles. Using a reduction of policy optimization to online learning, we introduce a novel IL algorithm MAMBA, which can provably learn a policy competitive with this benchmark.
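The benchmark proposed in the abstract, the state-wise maximum of the oracle policies' values, can be illustrated with a small tabular sketch. This is only an illustration under stated assumptions, not the MAMBA algorithm: the function name `max_oracle_baseline`, the finite state space, and the example value table are all hypothetical, and the oracle values are assumed to be known exactly.

```python
from typing import List, Sequence, Tuple


def max_oracle_baseline(
    oracle_values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int]]:
    """Return the state-wise maximum over oracle value functions.

    oracle_values[k][s] is assumed to hold V^k(s), the value of oracle k
    in state s (hypothetical tabular setting, values assumed known).
    """
    num_states = len(oracle_values[0])
    baseline: List[float] = []
    best_oracle: List[int] = []
    for s in range(num_states):
        # Collect every oracle's value for this state.
        column = [values[s] for values in oracle_values]
        top = max(column)
        baseline.append(top)
        # Ties are broken in favor of the lowest oracle index.
        best_oracle.append(column.index(top))
    return baseline, best_oracle


# Two hypothetical oracles that disagree on which states are good.
example_values = [
    [1.0, 0.2, 0.5],  # oracle 0
    [0.4, 0.9, 0.5],  # oracle 1
]
baseline, best_oracle = max_oracle_baseline(example_values)
print(baseline)     # [1.0, 0.9, 0.5]
print(best_oracle)  # [0, 1, 0]
```

In this toy table neither oracle dominates the other, yet the baseline is at least as large as each oracle's value in every state, which is why it serves as a benchmark when oracles give conflicting advice.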

imitation, name change, policy improvement, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Policy Improvement via Imitation of Multiple Oracles

Neural Information Processing Systems | Oct-2-2025, 17:37:15 GMT

Despite its promise, reinforcement learning's real-world adoption has been hampered by the need for costly exploration to learn a good policy.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.95)

Review for NeurIPS paper: Policy Improvement via Imitation of Multiple Oracles

Neural Information Processing Systems | Jun-2-2025, 12:03:25 GMT

Weaknesses: The highest-priority comments are the P0 comments listed below. P0: I think you should clarify what you mean by "experts". You allow the definition of experts to include suboptimal policies, but is there a limit on how suboptimal they may be? I feel this needs to be clarified. If they can be any policy, does this not fall more in the domain of off-policy/batch RL than imitation learning?

multiple oracle, neurips paper, policy improvement, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.63)

Policy Improvement via Imitation of Multiple Oracles

Neural Information Processing Systems | Oct-10-2024, 00:23:04 GMT

Despite its promise, reinforcement learning's real-world adoption has been hampered by the need for costly exploration to learn a good policy. Imitation learning (IL) mitigates this shortcoming by using an oracle policy during training as a bootstrap to accelerate the learning process. However, in many practical situations, the learner has access to multiple suboptimal oracles, which may provide conflicting advice in a state. The existing IL literature provides a limited treatment of such scenarios. Whereas in the single-oracle case, the return of the oracle's policy provides an obvious benchmark for the learner to compete against, neither such a benchmark nor principled ways of outperforming it are known for the multi-oracle setting.

imitation, multiple oracle, policy improvement, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Active Policy Improvement from Multiple Black-box Oracles

Liu, Xuefeng, Yoneda, Takuma, Wang, Chaoqi, Walter, Matthew R., Chen, Yuxin

arXiv.org Artificial Intelligence | Jul-5-2023

Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improve their value function estimates, and MAPS-SE additionally leverages an active state exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAPS and MAPS-SE enjoy sample efficiency advantage over the state-of-the-art policy improvement algorithms. Empirical results show that MAPS-SE significantly accelerates policy optimization via state-wise imitation learning from multiple oracles across a broad spectrum of control tasks in the DeepMind Control Suite. Our code is publicly available at: https://github.com/ripl/maps.
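The idea of actively choosing which oracle to follow in a given state can be sketched as below. The abstract does not spell out the selection rule, so this is not the MAPS criterion: the optimistic score (mean plus a bonus times the standard deviation of sampled value estimates), the function name `select_oracle`, and the `bonus` parameter are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def select_oracle(value_samples: Sequence[Sequence[float]], bonus: float = 1.0) -> int:
    """Pick an oracle index for a single state.

    value_samples[k] is assumed to hold several noisy estimates of oracle k's
    value in the current state (hypothetical, e.g. from an ensemble).
    The score is optimistic: uncertain oracles receive a bonus, so they are
    tried more often and their estimates can be refined.
    """
    scores = [mean(samples) + bonus * pstdev(samples) for samples in value_samples]
    # Index of the highest optimistic score.
    return max(range(len(scores)), key=scores.__getitem__)


# Oracle 0 is well estimated; oracle 1 has a lower mean but high uncertainty.
samples = [
    [0.6, 0.6, 0.6],
    [0.2, 0.9, 0.55],
]
print(select_oracle(samples, bonus=1.0))  # 1: the uncertainty bonus favors oracle 1
print(select_oracle(samples, bonus=0.0))  # 0: the plain mean favors oracle 0
```

Setting `bonus` to zero reduces the rule to picking the oracle with the highest estimated value in that state, while a larger bonus spends more queries on oracles whose estimates are still uncertain.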

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2306.10259

Country: